Worst-Case Execution Time Guarantees for Runtime-Reconfigurable Architectures

Damschen, Marvin

doi:10.5445/IR/1000089975

Abstract (English):

Real-time systems are ubiquitous in everyday life, e.g., in safety-critical domains such as automotive, avionics, or robotics. The correctness of a real-time system depends not only on the correctness of its calculations, but also on the non-functional requirement of adhering to deadlines. Failing to meet a deadline may lead to severe malfunctions; therefore, worst-case execution times (WCET) need to be guaranteed. Despite significant scientific advances, however, timing analysis for WCET guarantees lags years behind current high-performance microarchitectures with out-of-order scheduling pipelines, several hardware threads, and multiple (shared) cache layers. To satisfy the increasing performance demands of real-time systems, analyzable performance features are required. To escape the scarcity of timing-analyzable performance features, the main contribution of this thesis is the introduction of runtime reconfiguration of hardware accelerators onto a field-programmable gate array (FPGA) as a novel means to achieve performance that is amenable to WCET guarantees. Instead of designing an architecture for a specific application domain, this approach preserves the flexibility of the system.

First, this thesis contributes novel co-scheduling approaches that distribute work among CPU and GPU, as part of an extensive analysis of how (average-case) performance is achieved on fused CPU-GPU architectures. These architectures, a main trend in current high-performance microarchitectures, combine a CPU and a GPU on a single chip. Being able to employ them in real-time systems would be highly desirable, because they provide high performance within a limited area and power budget. The analysis, however, uncovers a cache coherency bottleneck in recent fused CPU-GPU architectures that share the last-level cache between CPU and GPU. This insight (i) complicates performance predictions and (ii) adds a shared last-level cache between CPU and GPU to the growing list of microarchitectural features that benefit average-case performance but render the analysis of WCET guarantees on high-performance architectures virtually infeasible. It thus further motivates the need for novel microarchitectural features that provide predictable performance and are amenable to timing analysis.

Towards this end, a runtime reconfiguration controller called "Command-based Reconfiguration Queue" (CoRQ) is presented that provides guaranteed latencies for its operations, especially for the reconfiguration delay, i.e., the time it takes to reconfigure a hardware accelerator onto a reconfigurable fabric (e.g., an FPGA). CoRQ enables the design of timing-analyzable runtime-reconfigurable architectures that support WCET guarantees. Based on the (now feasible) guaranteed reconfiguration delay of accelerators, a WCET analysis is introduced that enables tasks to reconfigure application-specific custom instructions (CIs) at runtime. CIs are executed by a processor pipeline and invoke execution of one or more accelerators. Different measures to deal with reconfiguration delays are compared for their impact on accelerated WCET guarantees and overestimation. The timing anomaly of runtime reconfiguration is identified and safely bounded: a case where executing iterations of a computational kernel faster than their WCET while CIs are being reconfigured can prolong the total execution time of a task. Once tasks that perform runtime reconfiguration of CIs can be analyzed for WCET guarantees, the question arises of which CIs to configure on a constrained reconfigurable area to optimize the WCET. This question is addressed for systems in which several CIs can be selected, each with multiple implementations that trade off latency against area requirements.
This is generally the case, e.g., when employing high-level synthesis. This so-called WCET-optimizing instruction set selection problem is modeled based on the Implicit Path Enumeration Technique (IPET), the path analysis technique that state-of-the-art timing analyzers rely on. To our knowledge, this is the first approach that enables WCET optimization using global program flow information (and information about reconfiguration delay). An optimal algorithm (similar to Branch and Bound) and a fast greedy heuristic algorithm (that achieves the optimal solution in most cases) are presented. Finally, an approach is presented that, for the first time, combines optimized static WCET guarantees with runtime optimization of average-case execution while maintaining WCET guarantees. It uses runtime reconfiguration of hardware accelerators and leverages runtime slack (the amount of time by which program parts execute faster than their WCET). The approach comprises an analysis of runtime slack bounds that enable safe reconfiguration for average-case performance under WCET guarantees, and it presents a mechanism to monitor runtime slack using a simple performance counter that is commonly available in microprocessors.
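
The selection problem described above can be illustrated with a deliberately simplified sketch. It is not the thesis's IPET-based formulation or its algorithms: the program is reduced to two explicitly enumerated paths, reconfiguration delay is ignored, the optimal search is plain exhaustive enumeration rather than a Branch-and-Bound-like algorithm, and the greedy rule (largest WCET reduction per unit of area) is an assumption chosen for illustration. All names (`IMPLS`, `PATHS`, `AREA_BUDGET`) and all numbers are hypothetical.

```python
from itertools import product

# Hypothetical data: for each custom instruction (CI), a list of
# implementations as (area units, cycles per execution).
# Index 0 is the software fallback, which occupies no reconfigurable area.
IMPLS = {
    "fft": [(0, 900), (2, 300), (4, 120)],
    "sad": [(0, 400), (1, 150), (3, 60)],
    "dct": [(0, 600), (2, 250)],
}

# Hypothetical program paths: worst-case execution count of each CI per path.
# (A real analyzer would use IPET instead of enumerating paths explicitly.)
PATHS = [
    {"fft": 10, "sad": 5},
    {"sad": 20, "dct": 8},
]

AREA_BUDGET = 5  # assumed size of the reconfigurable area


def wcet(selection):
    """WCET bound in cycles: the longest path under the given selection."""
    return max(
        sum(count * IMPLS[ci][selection[ci]][1] for ci, count in path.items())
        for path in PATHS
    )


def area(selection):
    """Total area occupied by the selected implementations."""
    return sum(IMPLS[ci][idx][0] for ci, idx in selection.items())


def exhaustive():
    """Try every feasible selection and keep the one with the lowest WCET."""
    names = list(IMPLS)
    best = None
    for choice in product(*(range(len(IMPLS[n])) for n in names)):
        sel = dict(zip(names, choice))
        if area(sel) <= AREA_BUDGET and (best is None or wcet(sel) < wcet(best)):
            best = sel
    return best


def greedy():
    """Repeatedly apply the single change with the best WCET gain per area."""
    sel = {n: 0 for n in IMPLS}
    while True:
        base, best_gain, best_move = wcet(sel), 0.0, None
        for n in IMPLS:
            for idx in range(len(IMPLS[n])):
                cand = {**sel, n: idx}
                extra = area(cand) - area(sel)
                if extra <= 0 or area(cand) > AREA_BUDGET:
                    continue  # only consider upgrades that fit the budget
                gain = (base - wcet(cand)) / extra
                if gain > best_gain:
                    best_gain, best_move = gain, cand
        if best_move is None:
            return sel  # no remaining change reduces the WCET
        sel = best_move


if __name__ == "__main__":
    for name, solver in (("exhaustive", exhaustive), ("greedy", greedy)):
        sel = solver()
        print(name, sel, "area:", area(sel), "WCET:", wcet(sel))
```

With these made-up numbers, both searches settle on the same selection and lower the bound from 12800 cycles (software only) to 5000 cycles. Note that accelerating `fft` alone yields no gain at first, because the other path dominates the WCET; this is the kind of global flow effect that a purely local, per-kernel selection would miss.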

Ultimately, this thesis shows that runtime reconfiguration of accelerators is a key feature to achieve predictable performance.

Published on 25.01.2019

Associated institution(s) at KIT	Institut für Technische Informatik (ITEC)
Publication type	Thesis
Publication year	2019
Language	English
Identifier	urn:nbn:de:swb:90-899755 KITopen-ID: 1000089975
Publisher	Karlsruher Institut für Technologie (KIT)
Extent	XIII, 106 pp.
Type of thesis	Dissertation
Faculty	Fakultät für Informatik (INFORMATIK)
Institute	Institut für Technische Informatik (ITEC)
Examination date	19.12.2018
Project information	SFB/TRR 89/2, 146371743 (DFG, DFG KOORD, TRR 89/2 2014)
External relations	Research data/software
Keywords	Worst-Case Execution Time, WCET, Predictability, FPGA, Runtime Reconfiguration, Reconfigurable Computing, Real-Time Systems, Embedded Systems, Fused CPU-GPU Architectures, Timing Anomaly, Invasive Computing, InvasIC
Indexed in	OpenAlex
Relations in KITopen	References:
	Timing Analysis of Tasks on Runtime Reconfigurable Processors. Damschen, Marvin; Bauer, Lars; Henkel, Jorg (2017) Journal article (1000070654)
	Extending the WCET problem to optimize for runtime-reconfigurable processors. Damschen, M.; Bauer, L.; Henkel, J. (2016) Journal article (1000065677)
	CoRQ: Enabling Runtime Reconfiguration under WCET Guarantees for Real-Time Systems. Damschen, Marvin; Bauer, Lars; Henkel, Jörg (2017) Journal article (1000072226)
	Floating Point Acceleration for Stream Processing Applications in Dynamically Reconfigurable Processors. Bauer, Lars; Grudnitsky, Artjom; Damschen, Marvin; Kerekare, Srinivas Rao; Henkel, Jörg (2015) Proceedings contribution (1000052816)
	Shallow Water Waves on a Deep Technology Stack : Accelerating a Finite Volume Tsunami Model Using Reconfigurable Hardware in Invasive Computing. Pöppl, Alexander; Damschen, Marvin; Schmaus, Florian; Fried, Andreas; Mohr, Manuel; Blankertz, Matthias; Bauer, Lars; Henkel, Jörg; Schröder-Preikschat,... (2018) Proceedings contribution (1000080989)
	Auto-SI: An adaptive reconfigurable processor with run-time loop detection and acceleration. Harbaum, Tanja; Schade, Christoph; Damschen, Marvin; Tradowsky, Carsten; Bauer, Lars; Henkel, Jorg; Becker, Jurgen (2017) Proceedings contribution (1000084897)
Referee/Supervisor	Henkel, J.
