KIT | KIT-Bibliothek | Impressum | Datenschutz

PHOBIC: Perfect Hashing With Optimized Bucket Sizes and Interleaved Coding

Hermann, Stefan 1; Lehmann, Hans-Peter ORCID iD icon 1; Pibiri, Giulio Ermanno; Sanders, Peter ORCID iD icon 1; Walzer, Stefan 1
1 Institut für Theoretische Informatik (ITI), Karlsruher Institut für Technologie (KIT)

Abstract:

A minimal perfect hash function (or MPHF) maps a set of n keys to [n] : = {1, …, n} without collisions. Such functions find widespread application e.g. in bioinformatics and databases. In this paper we revisit PTHash - a construction technique particularly designed for fast queries. PTHash distributes the input keys into small buckets and, for each bucket, it searches for a hash function seed that places its keys in the output domain without collisions. The collection of all seeds is then stored in a compressed way. Since the first buckets are easier to place, buckets are considered in non-increasing order of size. Additionally, PTHash heuristically produces an imbalanced distribution of bucket sizes by distributing 60% of the keys into 30% of the buckets.
Our main contribution is to characterize, up to lower order terms, an optimal choice for the expected bucket sizes, improving construction throughput for space efficient configurations both in theory and practice. Further contributions include a new encoding scheme for seeds that works across partitions of the data structure and a GPU parallelization.
Compared to PTHash, PHOBIC is 0.17 bits/key more space efficient for same query time and construction throughput. ... mehr


Verlagsausgabe §
DOI: 10.5445/IR/1000174479
Veröffentlicht am 23.09.2024
Originalveröffentlichung
DOI: 10.4230/LIPIcs.ESA.2024.69
Cover der Publikation
Zugehörige Institution(en) am KIT Institut für Theoretische Informatik (ITI)
Publikationstyp Proceedingsbeitrag
Publikationsdatum 23.09.2024
Sprache Englisch
Identifikator ISBN: 978-3-95977-338-6
ISSN: 1868-8969
KITopen-ID: 1000174479
HGF-Programm 46.21.02 (POF IV, LK 01) Cross-Domain ATMLs and Research Groups
Erschienen in 32nd Annual European Symposium on Algorithms (ESA 2024). Ed.: T. Chan
Veranstaltung 32nd Annual European Symposium on Algorithms (ESA 2024), London, Vereinigtes Königreich, 02.09.2024 – 04.09.2024
Verlag Schloss Dagstuhl - Leibniz-Zentrum für Informatik (LZI)
Serie Leibniz International Proceedings in Informatics (LIPIcs) ; 308
Schlagwörter Compressed Data Structures, Minimal Perfect Hashing, GPU, Theory of computation → Data compression, Information systems → Point lookups
Nachgewiesen in Scopus
KIT – Die Forschungsuniversität in der Helmholtz-Gemeinschaft
KITopen Landing Page