
Surrogate Modeling for Scalable Evaluation of Distributed Computing Systems for HEP Applications

Schmid, Larissa 1; Horzela, Maximilian M. 2; Zhyla, Valerii 1; Giffels, Manuel 2; Quast, Günter 2; Koziolek, Anne 1; Szumlak, T. [Ed.]; Rachwał, B. [Ed.]; Dziurda, A. [Ed.]; Schulz, M. [Ed.]; vom Bruch, D. [Ed.]; Ellis, K. [Ed.]; Hageboeck, S. [Ed.]
1 Institut für Informationssicherheit und Verlässlichkeit (KASTEL), Karlsruher Institut für Technologie (KIT)
2 Institut für Experimentelle Teilchenphysik (ETP), Karlsruher Institut für Technologie (KIT)

Abstract:

The Worldwide LHC Computing Grid (WLCG) provides the robust computing infrastructure essential for the LHC experiments by integrating global computing resources into a cohesive entity. Simulating different compute models is a feasible approach for evaluating adaptations that can cope with increased future demands. However, running these simulations incurs a trade-off between accuracy and scalability. For example, while the simulator DCSim can provide accurate results, it scales poorly with the size of the simulated platform. Using Generative Machine Learning as a surrogate is a candidate for overcoming this challenge.
In this work, we evaluate the use of three different Machine Learning models for the simulation of distributed computing systems and assess their ability to generalize to unseen situations. We show that these models can predict central observables derived from execution traces of compute jobs with approximate accuracy, but with execution times that are orders of magnitude shorter. Furthermore, we identify potential for improving the accuracy and generalizability of the predictions.


Publisher's version
DOI: 10.5445/IR/1000188060
Published on 5 December 2025
Affiliated institution(s) at KIT: Institut für Experimentelle Teilchenphysik (ETP); Institut für Informationssicherheit und Verlässlichkeit (KASTEL)
Publication type: Journal article
Publication year: 2025
Language: English
Identifier: ISSN 2100-014X
KITopen-ID: 1000188060
Published in: EPJ Web of Conferences
Publisher: EDP Sciences
Volume: 337
Pages: 01130
Publication note: 27th International Conference on Computing in High Energy and Nuclear Physics (CHEP 2024), Kraków, 21st-25th October 2024
Published online in advance on 7 October 2025
Indexed in: Dimensions; OpenAlex; Scopus