KIT | KIT-Bibliothek | Impressum | Datenschutz

Learned Monotone Minimal Perfect Hashing

Ferragina, Paolo; Lehmann, Hans-Peter ORCID iD icon 1; Sanders, Peter ORCID iD icon 2; Vinciguerra, Giorgio
1 Institut für Theoretische Informatik (ITI), Karlsruher Institut für Technologie (KIT)
2 Karlsruher Institut für Technologie (KIT)

Abstract:

A Monotone Minimal Perfect Hash Function (MMPHF) constructed on a set S of keys is a function that maps each key in S to its rank. On keys not in S, the function returns an arbitrary value. Applications range from databases, search engines, data encryption, to pattern-matching algorithms.
In this paper, we describe LeMonHash, a new technique for constructing MMPHFs for integers. The core idea of LeMonHash is surprisingly simple and effective: we learn a monotone mapping from keys to their rank via an error-bounded piecewise linear model (the PGM-index), and then we solve the collisions that might arise among keys mapping to the same rank estimate by associating small integers with them in a retrieval data structure (BuRR). On synthetic random datasets, LeMonHash needs 34% less space than the next larger competitor, while achieving about 16 times faster queries. On real-world datasets, the space usage is very close to or much better than the best competitors, while achieving up to 19 times faster queries than the next larger competitor. As far as the construction of LeMonHash is concerned, we get an improvement by a factor of up to 2, compared to the competitor with the next best space usage.
... mehr


Verlagsausgabe §
DOI: 10.5445/IR/1000162137
Veröffentlicht am 13.09.2023
Originalveröffentlichung
DOI: 10.4230/LIPIcs.ESA.2023.46
Cover der Publikation
Zugehörige Institution(en) am KIT Institut für Theoretische Informatik (ITI)
Publikationstyp Proceedingsbeitrag
Publikationsjahr 2023
Sprache Englisch
Identifikator ISBN: 978-3-95977-295-2
ISSN: 1868-8969
KITopen-ID: 1000162137
Erschienen in 31st Annual European Symposium on Algorithms (ESA 2023), Schloss Dagstuhl - Leibniz-Zentrum für Informatik. Ed.: I. Gørtz
Veranstaltung 31st 31st Annual European Symposium on Algorithms (Part of ALGO 2023) (ESA 2023), Amsterdam, Niederlande, 04.09.2023 – 08.09.2023
Verlag Schloss Dagstuhl - Leibniz-Zentrum für Informatik (LZI)
Seiten 46:1-46:17
Serie Leibniz international proceedings in informatics : LIPIcs / Schloss Dagstuhl Leibniz-Zentrum für Informatik ; 274
Schlagwörter compressed data structure, monotone minimal perfect hashing, retrieval, Theory of computation → Data compression, Information systems → Point lookups
Nachgewiesen in Scopus
KIT – Die Forschungsuniversität in der Helmholtz-Gemeinschaft
KITopen Landing Page