URN: urn:nbn:de:swb:90-AAA1101043589
DOI: 10.1088/1742-6596/664/5/052019

Active job monitoring in pilots

Kuehn, E.; Fischer, M.; Giffels, M.; Jung, C.; Petzold, A.

Recent developments in high energy physics (HEP), including multi-core jobs and multi-core pilots, require data centres to gain a deep understanding of the system in order to monitor, design, and upgrade computing clusters. Networking is a critical component. In particular, the increased usage of data federations, for example in diskless computing centres or as a fall-back solution, relies on WAN connectivity and availability. The specific demands of different experiments and communities, as well as the need to identify misbehaving batch jobs, require active monitoring. Existing monitoring tools are not capable of measuring fine-grained information at batch job level. This complicates network-aware scheduling and optimisations. In addition, pilots add another layer of abstraction. They behave like batch systems themselves by managing and executing payloads of jobs internally. The number of real jobs being executed is unknown, as the original batch system has no access to internal information about the scheduling process inside the pilots. Therefore, the comparability of jobs and pilots for predicting run-time behaviour or network perfo ...

Affiliated institution(s) at KIT Steinbuch Centre for Computing (SCC)
Institut für Kernphysik (IKP)
Publication type Journal article
Year 2015
Language English
Identifier ISSN: 1742-6588, 1742-6596
KITopen ID: 110104358
HGF programme 51.01.01; LK 01
Published in Journal of physics / Conference Series
Volume 664
Issue 5
Pages 052019/1-8
