Effective Parameter-free Boolean Instance Matching

Ma, Yongtao; Tran, Thanh

Abstract:

Instance matching is an important step in data integration, where the goal is to find instance representations that refer to the same real-world entity. State-of-the-art methods use training data to learn combinations of attributes, similarity functions and thresholds, called instance matching rules, for finding matches. Learning complex rules with thresholds is, however, difficult and thus very sensitive to training data and parameters. In this paper, we explore a different avenue, proposing an approach that does not use thresholds but simpler boolean similarity functions. We show that the simple boolean nature of the employed rules allows for a parameter-free learning approach. For high effectiveness, we propose to incorporate fine-grained word-level evidence into rule learning. That is, instead of capturing the similarity of entire attribute values in the rules, our approach employs words extracted from attribute values. Using benchmark matching tasks, we show that the proposed solution greatly outperforms state-of-the-art approaches in terms of result quality and, most importantly, is not sensitive to the choice of training data and parameters.
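
As a rough illustration of what a threshold-free, word-level boolean rule could look like, the sketch below treats a rule as a conjunction of attributes and declares two instances a match only if they share at least one word in every attribute of the rule. This is an assumption made for illustration, not the authors' algorithm: the names `words`, `shares_word` and `matches`, the whitespace tokenization, and the sample records are all hypothetical, and the rule is written by hand rather than learned.

```python
def words(value):
    """Return the set of lowercased, whitespace-separated words of a value."""
    return set(value.lower().split())


def shares_word(a, b, attribute):
    """Boolean similarity (hypothetical): True if both instances have at
    least one word in common in the given attribute. No threshold involved."""
    return bool(words(a.get(attribute, "")) & words(b.get(attribute, "")))


def matches(a, b, rule):
    """A rule is a tuple of attribute names, read as a conjunction:
    every attribute must pass the boolean word-overlap test."""
    return all(shares_word(a, b, attribute) for attribute in rule)


# Made-up instances, for illustration only.
a = {"title": "Apple iPhone 5 16GB", "brand": "Apple"}
b = {"title": "iPhone 5 black", "brand": "apple inc"}
c = {"title": "Galaxy S4", "brand": "Samsung"}

rule = ("title", "brand")  # hand-written here; the paper learns rules from data
same_ab = matches(a, b, rule)
same_ac = matches(a, c, rule)
```

Because each test is a plain yes/no check, such a rule has no similarity threshold to tune, which is the property the abstract relies on for parameter-free learning.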

Affiliated institution(s) at KIT	Institut für Angewandte Informatik und Formale Beschreibungsverfahren (AIFB)
Publication type	Research report/Preprint
Publication year	2013
Language	English
Identifier	KITopen-ID: 1000091519
Publisher	Karlsruher Institut für Technologie (KIT)

Repository KITopen
