
Decision Trees for the Imputation of Categorical Data

Rockel, Tobias; Joenssen, Dieter William; Bankhofer, Udo

In theory, any prediction model can be used to resolve the problem of missing data via imputation. In machine learning, the decision tree is a well-known type of prediction model, yet the literature on how suitable decision trees are for imputation remains scant. The aim of this paper is therefore to analyze the imputation quality of decision trees. Furthermore, we present a way to conduct stochastic imputation using decision trees. In a simulation study, we compare the deterministic and stochastic decision-tree imputation approaches with each other and with other imputation methods. The study uses real datasets and various missing-data settings, and considers three different quality criteria. The results indicate that the choice of imputation method should be based on the intended analysis.
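
The abstract does not spell out how the deterministic and stochastic variants differ, so the following is only a minimal sketch under stated assumptions, not the authors' procedure. It assumes that deterministic imputation fills a missing category with the most frequent category in the matching leaf, and that stochastic imputation draws a category at random in proportion to the leaf's observed frequencies. To stay self-contained, the "tree" is a single split on one categorical predictor; the names `fit_stump`, `impute`, and the toy columns `smoker` and `risk` are hypothetical.

```python
import random
from collections import Counter

def fit_stump(rows, predictor, target):
    """Build a one-split 'tree': one leaf per predictor value,
    each leaf holding category counts of the observed target."""
    leaves = {}
    for row in rows:
        if row[target] is not None:
            leaves.setdefault(row[predictor], Counter())[row[target]] += 1
    return leaves

def impute(rows, leaves, predictor, target, stochastic=False, seed=0):
    """Return copies of rows with missing target values filled in.

    Assumption: deterministic = leaf mode, stochastic = random draw
    weighted by the leaf's category counts."""
    rng = random.Random(seed)
    # Fallback for predictor values without any observed target.
    pooled = sum(leaves.values(), Counter())
    completed = []
    for row in rows:
        row = dict(row)
        if row[target] is None:
            counts = leaves.get(row[predictor]) or pooled
            if stochastic:
                categories = list(counts)
                weights = [counts[c] for c in categories]
                row[target] = rng.choices(categories, weights=weights, k=1)[0]
            else:
                row[target] = counts.most_common(1)[0][0]
        completed.append(row)
    return completed

# Toy data (made up for illustration); None marks a missing value.
rows = [
    {"smoker": "yes", "risk": "high"},
    {"smoker": "yes", "risk": "high"},
    {"smoker": "yes", "risk": "low"},
    {"smoker": "no", "risk": "low"},
    {"smoker": "no", "risk": "low"},
    {"smoker": "yes", "risk": None},
    {"smoker": "no", "risk": None},
]

leaves = fit_stump(rows, "smoker", "risk")
deterministic = impute(rows, leaves, "smoker", "risk")
stochastic = impute(rows, leaves, "smoker", "risk", stochastic=True, seed=42)
```

Under these assumptions the deterministic variant always assigns the leaf's modal category, whereas the stochastic variant can also assign a minority category, so repeated draws reflect the spread of categories within a leaf.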

DOI: 10.5445/KSP/1000058749/14
Affiliated institution(s) at KIT: Institut für Informationswirtschaft und Marketing (IISM)
Publication type: Journal article
Publication year: 2017
Language: English
Identifier: ISSN: 2363-9881
KITopen-ID: 1000068770
Published in: Archives of Data Science, Series A (Online First)
Volume: 2
Issue: 1
Pages: 15 pp., online