Tier 3 batch system data locality via managed caches

Fischer, M.; Giffels, M.; Jung, C.; Kühn, E.; Quast, G.

Modern data processing increasingly relies on data locality for performance and scalability, whereas common HEP approaches aim for uniform resource pools with minimal locality, recently even across site boundaries. To combine the advantages of both, the High-Performance Data Analysis (HPDA) Tier 3 concept opportunistically establishes data locality via coordinated caches. In accordance with HEP Tier 3 activities, the design incorporates two major assumptions: First, only a fraction of the data is accessed regularly and is thus the deciding factor for overall throughput. Second, data access may fall back to non-local access, making permanent local data availability an inefficient use of resources. Based on this, the HPDA design generically extends available storage hierarchies into the batch system. Using the batch system itself for scheduling file locality, an array of independent caches on the worker nodes is dynamically populated with high-profile data. Cache state information is exposed to the batch system both for managing caches and for scheduling jobs. As a result, users directly work with a regular, adequately sized storage system. However, their automated batch processes are presented with
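The scheduling idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the in-memory cache sets, and the popularity heuristic are all hypothetical, and a real deployment would express this logic through the batch system itself rather than in standalone Python.

```python
from collections import Counter


def cached_fraction(job_files, node_cache):
    """Fraction of a job's input files already present in a node's cache."""
    if not job_files:
        return 0.0
    hits = sum(1 for name in job_files if name in node_cache)
    return hits / len(job_files)


def pick_node(job_files, caches, free_nodes):
    """Choose the free node holding the largest share of the job's inputs.

    If no node caches anything, some free node is still returned, which
    models the fallback to non-local data access. Ties are broken by
    node name so the result is deterministic.
    """
    if not free_nodes:
        return None
    return max(
        sorted(free_nodes),
        key=lambda node: cached_fraction(job_files, caches.get(node, set())),
    )


def files_to_cache(access_log, capacity):
    """Select the most frequently accessed files, up to `capacity` entries.

    Assumption: access frequency stands in for "high-profile data".
    """
    counts = Counter(access_log)
    return [name for name, _ in counts.most_common(capacity)]
```

Here `caches` maps a worker node name to the set of file names it currently holds, which stands in for the cache state information exposed to the batch system.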
Publisher's version
DOI: 10.5445/IR/110103470
Published on 05.08.2019
DOI: 10.1088/1742-6596/608/1/012018
Citations: 4
Affiliated institution(s) at KIT: Steinbuch Centre for Computing (SCC)
Institut für Kernphysik (IKP)
Publication type: Journal article
Publication year: 2015
Language: English
Identifier: ISSN: 1742-6588, 1742-6596
KITopen-ID: 110103470
HGF program: 53.52.02 (POF III, LK 02)
Published in: Journal of physics / Conference Series
Volume: 608
Issue: 1
Pages: Art. No. 012018/1-5
Indexed in: Scopus
