KIT | KIT-Bibliothek | Impressum | Datenschutz

Faster Block Tree Construction

Köppl, Dominik ; Kurpicz, Florian 1; Meyer, Daniel 2
1 Institut für Theoretische Informatik (ITI), Karlsruher Institut für Technologie (KIT)
2 Karlsruher Institut für Technologie (KIT)

Abstract:

The block tree [Belazzougui et al. J. Comput. Syst. Sci. '21] is a compressed text index that can answer access (extract a character at a position), rank (number of occurrences of a specified character in a prefix of the text), and select (size of smallest prefix such that a specified character has a specified rank) queries. It requires O(zlog(n/z)) words of space, where z is the number of Lempel-Ziv factors of the text. For some highly repetitive inputs, a block tree can require as little as 0.015 bits per character of the text. Small values of z make the block tree a space-efficient alternative to the wavelet tree, which is another index for these three types of queries. While wavelet trees can be constructed fast in practice, up so far compressed versions of the wavelet tree only leverage statistical compression, meaning that they are blind to spaced repetitions.
To make block trees usable in practice, a first step is to find ways in constructing them efficiently. We address this problem by presenting a practically efficient construction algorithm for block trees, which is up to an order of magnitude faster than previous implementations. ... mehr


Verlagsausgabe §
DOI: 10.5445/IR/1000162272
Veröffentlicht am 15.09.2023
Originalveröffentlichung
DOI: 10.4230/LIPIcs.ESA.2023.74
Cover der Publikation
Zugehörige Institution(en) am KIT Institut für Theoretische Informatik (ITI)
Publikationstyp Proceedingsbeitrag
Publikationsjahr 2023
Sprache Englisch
Identifikator KITopen-ID: 1000162272
Erschienen in 31st Annual European Symposium on Algorithms (ESA 2023). Ed.: I.L. Gørtz
Veranstaltung 31st 31st Annual European Symposium on Algorithms (Part of ALGO 2023) (ESA 2023), Amsterdam, Niederlande, 04.09.2023 – 08.09.2023
Verlag Schloss Dagstuhl - Leibniz-Zentrum für Informatik (LZI)
Schlagwörter compressed data structure; block tree; Lempel-Ziv compression; longest previous factor array; rank and select; Theory of computation → Data compression; Theory of computation → Pattern matching
Nachgewiesen in Scopus
KIT – Die Forschungsuniversität in der Helmholtz-Gemeinschaft
KITopen Landing Page