What Your Radiologist Might be Missing: Using Machine Learning to Identify Mislabeled Instances of X-ray Images

Rädsch, Tim; Eckhardt, Sven; Leiser, Florian; Pandl, Konstantin D.; Thiebes, Scott; Sunyaev, Ali

Abstract:

Label quality is an important and common problem in contemporary supervised machine learning research. Mislabeled instances in a data set might not only impact the performance of machine learning models negatively but also make it more difficult to explain, and thus trust, the predictions of those models. While extant research has especially focused on the ex-ante improvement of label quality by proposing improvements to the labeling process, more recent research has started to investigate the use of machine learning-based approaches to identify mislabeled instances in training data sets automatically. In this study, we propose a two-stage pipeline for the automatic detection of potentially mislabeled instances in a large medical data set. Our results show that our pipeline successfully detects mislabeled instances, helping us identify 7.4% of mislabeled instances of Cardiomegaly in the data set. With our research, we contribute to ongoing efforts regarding data quality in machine learning.

Affiliated institution(s) at KIT	Institut für Angewandte Informatik und Formale Beschreibungsverfahren (AIFB)
Publication type	Proceedings paper
Publication date	5 January 2021
Language	English
Identifier	KITopen-ID: 1000127578
Published in	Proceedings of the 54th Hawaii International Conference on System Sciences (HICSS)
Event	54th Hawaii International Conference on System Sciences (HICSS 2021), Online, 5–8 January 2021

Postprint

DOI: 10.5445/IR/1000127578

Published on 6 January 2022

