
Decision Trees for the Imputation of Categorical Data

Rockel, Tobias; Joenssen, Dieter William; Bankhofer, Udo

In principle, any prediction model can be used to resolve missing data via imputation. In machine learning, the decision tree is a well-known type of prediction model. However, the literature on the suitability of decision trees for imputation remains scant. The aim of this paper is therefore to analyze the imputation quality of decision trees. Furthermore, we present a way to conduct stochastic imputation using decision trees. We ran a simulation study to compare the deterministic and stochastic decision tree imputation approaches with each other and with other imputation methods. The study uses real datasets and various missing-data settings, and considers three different quality criteria. The results indicate that the choice of imputation method should be based on the intended analysis.
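The abstract does not spell out how the stochastic variant works, so the following is only a minimal sketch of one common reading, not the authors' procedure. It assumes a classification tree has already been fitted and that a record with a missing categorical value has been routed to a leaf. Deterministic imputation then fills in the leaf's most frequent category, while stochastic imputation draws a category at random in proportion to the leaf's observed frequencies. The function names and the example leaf contents are hypothetical.

```python
import random
from collections import Counter


def leaf_distribution(labels):
    """Relative frequency of each category among the training cases in a leaf."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def impute_deterministic(dist):
    """Impute the modal (most frequent) category of the leaf."""
    return max(dist, key=dist.get)


def impute_stochastic(dist, rng):
    """Impute a category drawn at random according to the leaf frequencies."""
    categories = list(dist)
    weights = [dist[c] for c in categories]
    return rng.choices(categories, weights=weights, k=1)[0]


# Hypothetical leaf: observed categories of the cases that ended up in it.
leaf_labels = ["red"] * 6 + ["blue"] * 3 + ["green"] * 1
dist = leaf_distribution(leaf_labels)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
det = [impute_deterministic(dist) for _ in range(1000)]
sto = [impute_stochastic(dist, rng) for _ in range(1000)]

print("deterministic:", Counter(det))
print("stochastic:   ", Counter(sto))
```

Under these assumptions, the deterministic variant always imputes the same category for a given leaf, whereas the stochastic variant roughly reproduces the category frequencies within the leaf. This is one way to see why the better choice may depend on the intended analysis.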

Affiliated institution(s) at KIT: Institut für Informationswirtschaft und Marketing (IISM)
Publication type: Journal article
Year: 2017
Language: English
Identifiers:
DOI: 10.5445/KSP/1000058749/14
ISSN: 2363-9881
URN: urn:nbn:de:swb:90-687708
KITopen ID: 1000068770
Published in: Archives of Data Science, Series A (Online First)
Volume: 2
Issue: 1
Pages: 15 pp., online
License: CC BY-SA 4.0 (Creative Commons Attribution-ShareAlike 4.0 International)