TEE-Based Distributed Ledgers and Their Resilience
Leinweber, Marc
Institut für Informationssicherheit und Verlässlichkeit (KASTEL), Karlsruher Institut für Technologie (KIT)
Abstract:
Resilience is the ability of a (distributed) system to withstand any stressful situation without imposing massive restrictions and, above all, without long-term consequences. Permissioned distributed ledgers based on state machine replication (SMR) offer a promising approach to achieving high resilience and fairness in federated systems. SMR provides a fault-tolerant service for clients by relying on all replicas being in a consistent state. The consistent state is achieved through a consensus algorithm, typically an atomic broadcast, that decides on a total order of client requests. In the Byzantine fault model, replicas are assumed to be potentially malicious; a Byzantine fault-tolerant (BFT) protocol withstands a fixed share of malicious actors. Classic BFT SMR protocols require $n>3t$ replicas and multiple rounds of communication to withstand $t$ faulty replicas, which complicates the implementation, limits achievable throughput, and increases latency. Trusted Execution Environments (TEEs) make it possible to implement SMR in the so-called hybrid fault model, in which replicas are assumed to be potentially Byzantine but the TEE can only fail by crashing. In the hybrid fault model, SMR requires less communication and can be implemented with $n>2t$ replicas. While many proposals aim to optimize BFT SMR by using TEEs, they still rely on a so-called leader that coordinates the agreement process among the replicas. The leader is known to be a bottleneck and, if it fails, the system has to recover from the failure and elect a new leader. The additional coordination required to elect a new leader can cause significant performance degradation, limiting the achieved resilience. Asynchronous protocols based on directed acyclic graphs (DAGs) eliminate the reliance on distinguished replicas by allowing all replicas to participate equally in the agreement process.
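To make the two replica bounds concrete, the following minimal sketch computes the smallest deployment size for a given number of faulty replicas $t$ under $n>3t$ (classic BFT) and $n>2t$ (hybrid fault model), and the largest $t$ a given $n$ can tolerate. The function names `min_replicas` and `max_faults` are illustrative assumptions and do not come from the dissertation.

```python
def min_replicas(t: int, hybrid: bool) -> int:
    """Smallest n with n > 3t (classic BFT) or n > 2t (hybrid fault model)."""
    if t < 0:
        raise ValueError("t must be non-negative")
    factor = 2 if hybrid else 3
    return factor * t + 1


def max_faults(n: int, hybrid: bool) -> int:
    """Largest t such that n > factor * t still holds."""
    if n < 1:
        raise ValueError("n must be positive")
    factor = 2 if hybrid else 3
    return (n - 1) // factor


if __name__ == "__main__":
    # Compare both fault models for a few values of t.
    for t in (1, 2, 3):
        classic = min_replicas(t, hybrid=False)
        hybrid = min_replicas(t, hybrid=True)
        print(f"t={t}: classic BFT needs n={classic}, hybrid needs n={hybrid}")
```

For example, tolerating one faulty replica takes at least four replicas in the classic Byzantine model but only three in the hybrid model.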
While asynchronous approaches and the hybrid fault model each contribute independently to increasing the resilience of BFT SMR systems, their combination has remained largely unexplored. This dissertation aims to fill this gap by answering the following research question:
What is the achievable performance and resilience of DAG-based, hybrid fault-tolerant state machine replication and under which preconditions can the leaderless nature be safely exploited to maximize throughput?
We proceed in three steps to enhance the resilience and performance of BFT SMR systems and to identify potential trade-offs that arise from the assumption of TEEs and asynchrony in BFT SMR. First, we investigate the fit of TEE-based SMR for consortium-operated applications using the example of Mobility-as-a-Service ticketing systems. We propose an SMR application that uses TEEs to protect sensitive customer and mobility provider data while limiting possibilities for fraud by both customers and mobility providers, and ensuring correct billing. We find that as long as secure multiparty computation is not competitive in terms of performance, TEE-based SMR can provide significant advantages in terms of efficiency and resilience while providing reasonable confidentiality guarantees. We describe the characteristics of the Mobility-as-a-Service use case and identify similar use cases from other domains, e.g., central bank digital currencies, allowing us to conclude that our findings generalize.
In the second step, we establish the foundation for a comprehensive analysis by proposing and proving TEE-Rider, the first hybrid fault-tolerant, asynchronous, and DAG-based atomic broadcast protocol. TEE-Rider builds upon the DAG-Rider protocol family and an optimized, DAG-aware, and TEE-based causal order broadcast we propose and prove. We then identify fundamental issues that arise from the combination of TEEs and asynchrony in BFT SMR. These are the impossibility of a fault-tolerant setup and the impossibility of garbage collection. Furthermore, we prove that for partially synchronous, TEE-based reliable broadcast it is impossible to reinitialize a TEE after a crash without relying on the participation of all $n$ replicas. We conclude the theoretical contributions with the proposal of the NxBFT SMR framework. Following an assumption-algorithm co-design, NxBFT is built upon TEE-Rider for the "Not eXactly Byzantine" (NxB) operating model to maximize throughput without sacrificing resilience. Moreover, NxBFT leverages SMR state transfer to circumvent the limitations imposed by TEEs and asynchrony and provides, under the assumption of partial synchrony, garbage collection, recovery, and reconfiguration.
Finally, we contribute an extensive empirical evaluation. To this end, we develop the ABCperf evaluation framework, which focuses on the fair and straightforward comparison of fault-tolerant SMR and agreement protocols. We investigate the performance characteristics of NxBFT and find that cryptographic operations for signature creation and verification are the main bottleneck. We compare the performance of NxBFT with the state-of-the-art leader-based, hybrid fault-tolerant protocols MinBFT and Chained-Damysus and investigate the impact of the SMR client model (BFT vs. NxB), payload sizes, network sizes, network latencies, and crash faults. While all algorithms can benefit from the NxB client model, NxBFT achieves the highest throughput in all scenarios with up to $\sim500\,000$ requests per second. All algorithms show improved end-to-end latency when using the BFT client model instead of the NxB client model. When low latencies are required, MinBFT and Damysus are at an advantage, with Damysus showing competitive throughput and impressively low latencies for small deployments. In contrast to leader-based approaches, NxBFT's performance is barely affected when actual crash faults occur.
Institut für Informationssicherheit und Verlässlichkeit (KASTEL) Kompetenzzentrum für angewandte Sicherheitstechnologie (KASTEL)
Publication type
Thesis
Publication date
19.01.2026
Language
English
Identifier
KITopen-ID: 1000189670
HGF program
46.23.01 (POF IV, LK 01) Methods for Engineering Secure Systems
Additional HGF programs
46.23.03 (POF IV, LK 01) Engineering Security for Mobility Systems
Publisher
Karlsruher Institut für Technologie (KIT)
Extent
ix, 201 pp.
Type of thesis
Dissertation
Faculty
Fakultät für Informatik (INFORMATIK)
Institute
Institut für Informationssicherheit und Verlässlichkeit (KASTEL)
Examination date
05.11.2025
Project information
KASTEL_SVI (BMFTR, 16KIS0521)
Keywords
Distributed Ledger Technology; Trusted Execution Environments; State Machine Replication; Distributed Systems Security; Security Evaluation; Performance Evaluation; Mobility-as-a-Service; Public IT Federations
Referee/Supervisor
Hartenstein, Hannes; Kapitza, Rüdiger