Improved Fast Similarity Search in Dictionaries

Karch, Daniel; Luxen, Dennis; Sanders, Peter

Abstract:

We engineer an algorithm to solve the approximate dictionary matching problem.
Given a list of words $\mathcal{W}$, a maximum distance $d$ fixed at preprocessing time, and a query word $q$, we would like to retrieve all words from $\mathcal{W}$ that can be transformed into $q$ with at most $d$ edit operations.
We present data structures that support fault-tolerant queries by generating an index.
On top of that, we present a generalization of the method that significantly reduces memory consumption and preprocessing time.
At the same time, running times of queries are virtually unaffected.
We are able to match in lists of hundreds of thousands of words and beyond within microseconds for reasonable distances.
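
To make the problem statement concrete, the following is a minimal brute-force sketch, not the indexed data structure the abstract refers to. It assumes that "edit operations" means Levenshtein operations (insertion, deletion, substitution of a single character); the function names `edit_distance` and `approximate_matches` are hypothetical and chosen only for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b, computed row by row."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer word, keep the DP row short
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # deleting the first i characters of a
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def approximate_matches(words, q, d):
    """Return all words within edit distance d of q using a linear scan.

    Assumption: this is only a baseline that defines the expected output;
    it inspects every word and builds no index.
    """
    return [w for w in words
            if abs(len(w) - len(q)) <= d  # cheap length filter
            and edit_distance(w, q) <= d]


if __name__ == "__main__":
    dictionary = ["sanders", "sander", "slander", "search", "index"]
    print(approximate_matches(dictionary, "sander", 1))
```

Such a scan costs time proportional to the size of the whole word list per query, which is the cost an index built at preprocessing time is meant to avoid.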

DOI: 10.5445/IR/1000020378

Affiliated institution(s) at KIT	Institut für Theoretische Informatik (ITI)
Publication type	Book
Publication year	2010
Language	English
Identifier	urn:nbn:de:swb:90-203781 KITopen-ID: 1000020378

Repository KITopen
