Learning Rules for Effective Almost-parameter-free Instance matching

Ma, Yongtao; Tran, Thanh

Abstract:

Instance matching is an important step in data integration whose goal is to find instance representations that refer to the same entity. In this paper, we propose an efficient approach for learning instance-matching rules, that is, combinations of attributes, similarity functions, and thresholds used to find matches. Existing rule-based approaches calculate the similarity of each attribute separately and identify an instance pair as a match only if every one of these similarities is high enough. As a result, they may fail to identify a matching instance pair when an error occurs in a single attribute. Moreover, these approaches cannot learn rules effectively without fine-tuning their parameters. They are also expensive to train, because they select the best rule from a large number of candidates, and this number depends on the number of attributes, the number of similarity functions, and especially the number of training examples. In this paper, we address these three problems. We compare two instances as a whole by calculating the average similarity over a set of attributes, so that an error in a single attribute is balanced by the others. The approach proposed in this paper is almost parameter-free: it estimates the parameter values directly from the training data and does not require fine-tuning them. ...
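The following Python sketch contrasts the two kinds of rule described in the abstract: a conventional rule that requires every attribute similarity to pass its own threshold, and a rule that averages the similarities over a set of attributes and applies a single threshold. It is a minimal illustration under stated assumptions, not the authors' algorithm: the token-Jaccard similarity function, the example records, and the thresholds 0.8 and 0.7 are hypothetical choices, and no rule learning or parameter estimation is shown.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
SimFn = Callable[[str, str], float]


def token_jaccard(a: str, b: str) -> float:
    """Hypothetical similarity function: Jaccard overlap of lower-cased whitespace tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0  # two empty values are treated as identical
    return len(ta & tb) / len(ta | tb)


def conjunctive_match(x: Record, y: Record,
                      rule: List[Tuple[str, SimFn, float]]) -> bool:
    """Per-attribute rule: every (attribute, similarity, threshold) triple must pass."""
    return all(sim(x[attr], y[attr]) >= t for attr, sim, t in rule)


def average_match(x: Record, y: Record,
                  attrs: List[Tuple[str, SimFn]], threshold: float) -> bool:
    """Whole-instance rule: the mean similarity over the attribute set must pass one threshold."""
    scores = [sim(x[attr], y[attr]) for attr, sim in attrs]
    return sum(scores) / len(scores) >= threshold


# Hypothetical records: b matches a but has a typo in one attribute; c is a different entity.
a = {"name": "John Smith", "city": "New York", "phone": "555 0101"}
b = {"name": "John Smith", "city": "Nwe York", "phone": "555 0101"}
c = {"name": "Jane Doe", "city": "Boston", "phone": "555 0199"}

attrs = [("name", token_jaccard), ("city", token_jaccard), ("phone", token_jaccard)]
per_attr = [(attr, sim, 0.8) for attr, sim in attrs]  # illustrative per-attribute thresholds

print(conjunctive_match(a, b, per_attr))  # False: the city similarity is only 1/3
print(average_match(a, b, attrs, 0.7))    # True: the mean similarity is 7/9, about 0.78
print(average_match(a, c, attrs, 0.7))    # False: the mean similarity is 1/9
```

The single typo in `city` makes the conjunctive rule reject a true match, whereas averaging lets the two intact attributes compensate for it while still rejecting the unrelated record.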

Affiliated institution(s) at KIT	Institut für Angewandte Informatik und Formale Beschreibungsverfahren (AIFB)
Publication type	Research report/Preprint
Publication year	2013
Language	English
Identifier	KITopen-ID: 1000091518
Publisher	Karlsruher Institut für Technologie (KIT)

Repository KITopen
