COBS: A Compact Bit-Sliced Signature Index

Bingmann, Timo; Bradley, P.; Gauger, Florian; Iqbal, Z.

We present COBS, a COmpact Bit-sliced Signature index, which is a cross-over between an inverted index and Bloom filters. Our target application is to index k-mers of DNA samples or q-grams from text documents and to process approximate pattern matching queries on the corpus with a user-chosen coverage threshold. Query results may contain a number of false positives, which decreases exponentially with the query length. We compare COBS to seven other index software packages on 100,000 microbial DNA samples. COBS' compact but simple data structure outperforms the other indexes in construction time and query performance, with Mantis by Pandey et al. in second place. However, unlike Mantis and other previous work, COBS does not need the complete index in RAM and is thus designed to scale to larger document sets.
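The abstract does not spell out the data layout, so the following is only a minimal sketch of the general idea it names: one Bloom filter per document, stored row-wise ("bit-sliced") so that a query reads a few rows and scores every document at once. The class `BitSlicedIndex`, the helpers `kmers` and `row_indices`, the salted BLAKE2b hash, and all parameter values are assumptions made for illustration and are not taken from the COBS implementation.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def row_indices(term, num_rows, num_hashes):
    """Map a term to num_hashes row indices (salted BLAKE2b as a stand-in hash)."""
    rows = []
    for salt in range(num_hashes):
        digest = hashlib.blake2b(
            term.encode(), digest_size=8, salt=bytes([salt])
        ).digest()
        rows.append(int.from_bytes(digest, "big") % num_rows)
    return rows


class BitSlicedIndex:
    """Hypothetical toy index: rows[r] holds bit r of every document's Bloom filter."""

    def __init__(self, docs, k=5, num_rows=1024, num_hashes=3):
        self.k = k
        self.num_rows = num_rows
        self.num_hashes = num_hashes
        self.names = list(docs)
        # Each row is a Python int used as a bit vector; bit j belongs to document j.
        self.rows = [0] * num_rows
        for j, name in enumerate(self.names):
            for term in kmers(docs[name], k):
                for r in row_indices(term, num_rows, num_hashes):
                    self.rows[r] |= 1 << j

    def query(self, pattern, threshold=0.8):
        """Return {document: fraction of query k-mers found} for scores >= threshold."""
        terms = kmers(pattern, self.k)
        if not terms:
            return {}
        counts = [0] * len(self.names)
        all_docs = (1 << len(self.names)) - 1
        for term in terms:
            # AND the rows selected by the hashes: bit j survives only if
            # document j's filter has every one of those bits set.
            hit = all_docs
            for r in row_indices(term, self.num_rows, self.num_hashes):
                hit &= self.rows[r]
            for j in range(len(self.names)):
                counts[j] += (hit >> j) & 1
        result = {}
        for name, count in zip(self.names, counts):
            score = count / len(terms)
            if score >= threshold:
                result[name] = score
        return result


docs = {"a": "ACGTACGTTGCAAGCT", "b": "TTTTGGGGCCCCAAAA"}
index = BitSlicedIndex(docs)
result = index.query("ACGTACGT", threshold=0.8)
```

In this sketch a document that contains every k-mer of the pattern always scores 1.0, because Bloom filters have no false negatives. Other documents can only appear through false positives, and each additional query k-mer must independently collide for such a document to stay above the threshold.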

DOI: 10.1007/978-3-030-32686-9_21
Citations: 5
Citations: 19
Affiliated institution(s) at KIT: Institut für Theoretische Informatik (ITI)
Publication type: Book chapter
Publication year: 2019
Language: English
Identifiers: ISBN: 978-3-030-32685-2
ISSN: 0302-9743
KITopen-ID: 1000098817
Published in: String Processing and Information Retrieval – 26th International Symposium, SPIRE 2019, Segovia, Spain, October 7–9, 2019, Proceedings. Ed.: N. Brisaboa
Edition: 1st ed.
Publisher: Springer
Pages: 285–303
Series: Lecture Notes in Computer Science; 11811
Published online in advance on: 3 October 2019
Indexed in: Dimensions
