Trustworthy machine learning for health care: scalable data valuation with the Shapley value

Pandl, Konstantin D.; Feiland, Fabian; Thiebes, Scott; Sunyaev, Ali

doi:10.1145/3450439.3451861

Abstract:

Collecting data from many sources is an essential approach for generating the large data sets required to train machine learning models. Trustworthy machine learning requires incentives, guarantees of data quality, and information privacy. Recent advances in data valuation methods for machine learning can help to provide these. In this work, we analyze the suitability of three different data valuation methods for medical image classification tasks, specifically the classification of pleural effusion, on an extensive data set of chest X-ray scans. Our results reveal that a heuristic for calculating the Shapley valuation scheme based on a k-nearest neighbor classifier can successfully value large quantities of data instances. We also demonstrate possible applications for incentivizing data sharing, efficiently detecting mislabeled data, and summarizing data sets to exclude private information. Thereby, this work contributes to developing modern data infrastructures for trustworthy machine learning in health care.
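
The abstract does not spell out the k-nearest neighbor heuristic, so the following is only a minimal sketch of one well-known way to compute Shapley values for an unweighted KNN classifier in closed form: rank the training points by distance to each test point and apply a backward recursion over that ranking. It may differ from the authors' implementation. The function name `knn_shapley`, its parameters, and the toy one-dimensional data are assumptions made for illustration.

```python
from math import dist


def knn_shapley(train_x, train_y, test_x, test_y, k=1):
    """Sketch: Shapley value of each training point under an unweighted
    KNN utility, averaged over all test points (illustrative only)."""
    n = len(train_x)
    values = [0.0] * n
    for x_t, y_t in zip(test_x, test_y):
        # Rank training points from nearest to farthest for this test point.
        order = sorted(range(n), key=lambda i: dist(train_x[i], x_t))
        # 1.0 where the ranked training label agrees with the test label.
        match = [1.0 if train_y[i] == y_t else 0.0 for i in order]
        s = [0.0] * n
        # Base case: the farthest point.
        s[n - 1] = match[n - 1] / n
        # Recurse from the second-farthest point towards the nearest one.
        for j in range(n - 2, -1, -1):
            rank = j + 1  # 1-based position in the distance ranking
            s[j] = s[j + 1] + (match[j] - match[j + 1]) / k * min(k, rank) / rank
        # Map the ranked values back to the original indices and average.
        for j, i in enumerate(order):
            values[i] += s[j] / len(test_x)
    return values


# Hypothetical toy data: the first training point sits right next to the
# test point but carries a different label, as a mislabeled instance might.
train_x = [(0.0,), (0.1,), (1.0,)]
train_y = [1, 0, 0]
values = knn_shapley(train_x, train_y, test_x=[(0.0,)], test_y=[0], k=1)
print(values)
```

In this toy example the first training point receives a negative value while the two correctly labeled points receive positive ones, which illustrates how low values can flag candidates for mislabeled data. The values sum to the KNN utility of the full training set, as the Shapley value requires.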

Affiliated institution(s) at KIT	Institut für Angewandte Informatik und Formale Beschreibungsverfahren (AIFB)
Publication type	Proceedings paper
Publication year	2021
Language	English
Identifier	ISBN: 978-1-4503-8359-2; KITopen-ID: 1000131207
Published in	CHIL '21: Proceedings of the Conference on Health, Inference, and Learning, April 2021. Ed.: M. Ghassemi
Event	ACM Conference on Health, Inference, and Learning (CHIL 2021), online, 8–9 April 2021
Publisher	Association for Computing Machinery (ACM)
Pages	47–57
Published online in advance on	8 April 2021
Keywords	Computer Vision, Data Valuation, Machine Learning, Medical Imaging, Shapley Value
Indexed in	Scopus, Dimensions, OpenAlex
Citations	Scopus: 14; Dimensions: 10

Export

Statistiken

Seitenaufrufe: 163
seit 07.04.2021

Repository KITopen

Trustworthy machine learning for health care: scalable data valuation with the shapley value

Abstract: