KIT | KIT-Bibliothek | Impressum | Datenschutz

Fast and Lightweight Distributed Suffix Array Construction

Haag, Manuel 1; Kurpicz, Florian 2; Sanders, Peter ORCID iD icon 2; Schimek, Matthias ORCID iD icon 2; Benoit, Anne [Hrsg.]; Kaplan, Haim [Hrsg.]; Wild, Sebastian [Hrsg.]; Herman, Grzegorz [Hrsg.]
1 Karlsruher Institut für Technologie (KIT)
2 Institut für Theoretische Informatik (ITI), Karlsruher Institut für Technologie (KIT)

Abstract:

The suffix array contains the lexicographical order of all suffixes of a text. It is one of the most well-studied text indices with applications in bioinformatics, compression, and pattern matching. The main bottleneck of distributed-memory suffix array construction algorithms is their memory requirements. Even careful implementations require 30×-60× the input size as working memory. We present a scalable and lightweight distributed-memory adaptation of the difference cover (DCX) suffix array construction algorithm. Our approach relies on novel bucketing and random chunk redistribution techniques which reduce our memory requirement to 20×-26× the input size for medium-sized inputs and to 14×-15× for large-sized inputs. Regarding running time, we achieve speedups of up to 5× over current state-of-the-art distributed suffix array construction algorithms.


Verlagsausgabe §
DOI: 10.5445/IR/1000187240
Veröffentlicht am 20.11.2025
Originalveröffentlichung
DOI: 10.4230/lipics.esa.2025.47
Cover der Publikation
Zugehörige Institution(en) am KIT Institut für Theoretische Informatik (ITI)
Publikationstyp Proceedingsbeitrag
Publikationsdatum 01.10.2025
Sprache Englisch
Identifikator ISSN: 1868-8969
KITopen-ID: 1000187240
Erschienen in 33rd Annual European Symposium on Algorithms (ESA 2025), Warsaw, 15th-17th September 2025
Veranstaltung 33rd Annual European Symposium on Algorithms (ESA 2025), Warschau, Polen, 15.09.2025 – 17.09.2025
Verlag Schloss Dagstuhl - Leibniz-Zentrum für Informatik (LZI)
Seiten 47:1-47:18
Serie Leibniz International Proceedings in Informatics (LIPIcs) ; 351
Schlagwörter Distributed Computing, Suffix Array Construction, Theory of computation → Distributed algorithms
Nachgewiesen in OpenAlex
Scopus
KIT – Die Universität in der Helmholtz-Gemeinschaft
KITopen Landing Page