
Sassy: fuzzy searching DNA sequences using SIMD

Beeloo, Rick; Groot Koerkamp, Ragnar 1; Birol, Inanc [Ed.]
1 Institut für Angewandte Informatik (IAI), Karlsruher Institut für Technologie (KIT)

Abstract:

$\textbf{Motivation:}$ Approximate string matching (ASM) is the problem of finding all occurrences of a pattern in a text while allowing up to k errors. Many modern methods use seed-chain-extend, which is fast in practice, but does not guarantee finding all matches with ≤k errors. However, applications such as CRISPR off-target detection require exhaustive results.
$\textbf{Results:}$ We introduce Sassy, a library and tool for ASM of short patterns in long texts. Sassy splits the text into four parts that are searched in parallel, and uses bitvectors in the text direction rather than the pattern direction. This has complexity $O(k \lceil n/W \rceil)$ when searching a random text of length n, where W=256 is the SIMD width, and provides significant speedups for small k. Separately, we allow matches of the pattern to extend beyond the text for an overhang cost of, e.g. α=0.5 per character, to find matches near contig or read ends.
Sassy is 4× to 15× faster than Edlib for patterns ≤1000 bp, and can search text with a throughput near 2 Gbp/s. Likewise, Sassy is over 100× faster than parasail. We apply Sassy to CRISPR off-target detection by searching 61 guide sequences in a human genome. ...
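To make the problem statement concrete, the following is a minimal Python sketch of approximate string matching with at most k errors, using the textbook Sellers-style dynamic program. It is not Sassy's SIMD bitvector algorithm and not Sassy's API: the function name `sellers_search` and its parameters are hypothetical, and the way the overhang cost α is charged (α per pattern character hanging past either end of the text, with 0 ≤ α ≤ 1) is an assumption based only on the description in the abstract.

```python
def sellers_search(pattern, text, k, alpha=None):
    """Return (end, cost) pairs for every text end position where the
    pattern matches some substring ending there with cost <= k.

    Hypothetical helper for illustration only. If alpha is given
    (assumed 0 <= alpha <= 1), pattern characters may hang over the
    start or end of the text at a cost of alpha each.
    """
    m, n = len(pattern), len(text)
    # Cost of leaving a pattern prefix unmatched before the text starts:
    # 1 per character (plain deletion) or alpha per character (overhang).
    edge = 1.0 if alpha is None else alpha
    # col[i] = cheapest alignment of pattern[:i] against a substring of
    # the text that ends at the current text position.
    col = [i * edge for i in range(m + 1)]
    costs = [col[m]]  # cost of a match ending at position 0
    for j in range(1, n + 1):
        prev_diag = col[0]  # col[0] stays 0: a match may start anywhere
        for i in range(1, m + 1):
            old = col[i]
            mismatch = pattern[i - 1] != text[j - 1]
            col[i] = min(prev_diag + mismatch,  # match or substitution
                         old + 1,               # extra text character
                         col[i - 1] + 1)        # skipped pattern character
            prev_diag = old
        costs.append(col[m])
    if alpha is not None:
        # At the end of the text, the remaining m - i pattern characters
        # may hang over the end for alpha each.
        costs[n] = min(col[i] + alpha * (m - i) for i in range(m + 1))
    return [(end, cost) for end, cost in enumerate(costs) if cost <= k]
```

For example, `sellers_search("ACGT", "GGAC", 1)` finds nothing, while `sellers_search("ACGT", "GGAC", 1, alpha=0.5)` reports a match ending at the end of the text, because the trailing `GT` of the pattern overhangs for a total cost of 1.0. This sketch takes time proportional to the pattern length times the text length, which is the kind of cost that bitvector and SIMD methods are designed to reduce.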


Publisher's version
DOI: 10.5445/IR/1000193772
Published on 09.06.2026
Original publication
DOI: 10.1093/bioinformatics/btag244
Affiliated institution(s) at KIT: Institut für Angewandte Informatik (IAI)
Publication type: Journal article
Publication date: 03.05.2026
Language: English
Identifier: ISSN: 1367-4803, 0266-7061, 1367-4811, 1460-2059
KITopen-ID: 1000193772
Published in: Bioinformatics
Publisher: Oxford University Press (OUP)
Volume: 42
Issue: 5
Pages: 1
Indexed in: OpenAlex, Scopus