
Learning Rules for Effective Almost-parameter-free Instance matching

Ma, Yongtao; Tran, Thanh


Instance matching is an important step in data integration; its goal is to find instance representations that refer to the same entity. In this paper, we propose an efficient approach for learning instance-matching rules, i.e., the attributes, similarity functions, and thresholds used to find matches. Existing rule-based approaches compute the similarity of each attribute separately and identify an instance pair as a match only if every one of these similarities is high enough. As a result, they may fail to identify matching instance pairs when an error occurs in a single attribute. In addition, these approaches cannot learn rules effectively without fine-tuning their parameters. Moreover, they are expensive to train, because they select the best rule from a large number of candidates, and that number depends on the number of attributes, the number of similarity functions, and especially the number of training examples. In this paper, we address these three problems. We compare two instances as a whole by calculating the average similarity over a set of attributes, which balances out errors in any single attribute. The proposed approach is almost parameter-free: it easily estimates the values of its parameters from the training data and does not require fine-tuning them. ...
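The contrast drawn above between per-attribute rules and an average-similarity rule can be illustrated with a small sketch. It assumes token-level Jaccard similarity, the hypothetical attribute names `name`, `city`, and `country`, made-up records, and illustrative thresholds; none of these come from the paper, and the sketch does not reproduce the authors' rule-learning or parameter-estimation procedure.

```python
"""Minimal sketch: per-attribute rules vs. an average-similarity rule.

Assumptions (illustrative only, not taken from the paper): token-level
Jaccard similarity, the attribute names, the records, and the thresholds.
"""


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the lower-cased token sets of two values."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0  # two empty values are treated as identical
    return len(ta & tb) / len(ta | tb)


def per_attribute_rule(x: dict, y: dict, thresholds: dict) -> bool:
    """Match only if EVERY attribute similarity reaches its own threshold."""
    return all(jaccard(x[attr], y[attr]) >= t for attr, t in thresholds.items())


def average_similarity_rule(x: dict, y: dict, attrs: list, threshold: float) -> bool:
    """Match if the AVERAGE similarity over the given attributes reaches one threshold."""
    avg = sum(jaccard(x[attr], y[attr]) for attr in attrs) / len(attrs)
    return avg >= threshold


# Hypothetical records: B is A with a typo in "city"; C is a different entity.
record_a = {"name": "Karlsruhe Institute of Technology", "city": "Karlsruhe", "country": "Germany"}
record_b = {"name": "Karlsruhe Institute of Technology", "city": "Karlsruhr", "country": "Germany"}
record_c = {"name": "University of Stuttgart", "city": "Stuttgart", "country": "Germany"}
ATTRS = ["name", "city", "country"]

if __name__ == "__main__":
    strict = {attr: 0.8 for attr in ATTRS}
    print(per_attribute_rule(record_a, record_b, strict))           # False
    print(average_similarity_rule(record_a, record_b, ATTRS, 0.6))  # True
    print(average_similarity_rule(record_a, record_c, ATTRS, 0.6))  # False
```

With the typo in `city`, the per-attribute rule rejects the pair because that single similarity is 0, while the average over the three attributes (about 0.67) still clears the illustrative threshold of 0.6. The unrelated record C averages about 0.39 and is rejected.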

Associated institution(s) at KIT: Institut für Angewandte Informatik und Formale Beschreibungsverfahren (AIFB)
Publication type: Research report/Preprint
Publication year: 2013
Language: English
Identifier: KITopen-ID: 1000091518
Publisher: Karlsruher Institut für Technologie (KIT)