Towards Automatically Refining Low-Quality Domain Knowledge: A Case Study in Healthcare

Bielski, Pawel 1; Jendral, Sönke; Witterauf, Lena; Bach, Jakob 1
1 Institut für Programmstrukturen und Datenorganisation (IPD), Karlsruher Institut für Technologie (KIT)


Machine learning is an effective tool for diagnosis prediction on electronic health records (EHRs). To tackle the challenge of insufficient data for rare diseases, researchers have employed Domain Knowledge Guided Machine Learning (DKGML), which incorporates domain knowledge from medical taxonomies. However, existing research on DKGML in healthcare assumes that the domain knowledge is of high quality and that its impact on predictions is solely positive. It is unclear whether these assumptions always hold and what to do if they do not. To address this gap, we define low-quality domain knowledge in the particular context of DKGML. Based on this definition, we propose methods to detect and refine low-quality domain knowledge. Preliminary results suggest that our approach improves the prediction of rare diseases on the MIMIC-III dataset.

