
Privacy and Utility of Private Synthetic Data for Medical Data Analyses

Appenzeller, Arno 1,2; Leitner, Moritz 1,2; Philipp, Patrick 2; Krempel, Erik; Beyerer, Jürgen 1,3
1 Institut für Anthropomatik und Robotik (IAR), Karlsruher Institut für Technologie (KIT)
2 Fraunhofer-Institut für Optronik, Systemtechnik und Bildauswertung (IOSB)
3 Institut für Informationssicherheit und Verlässlichkeit (KASTEL), Karlsruher Institut für Technologie (KIT)

Abstract:

The increasing availability and use of sensitive personal data raises a set of issues regarding the privacy of the individuals behind the data. These concerns become even more important when health data are processed, as such data are considered sensitive under most regulations worldwide. Privacy Enhancing Technologies (PETs) attempt to protect the privacy of individuals whilst preserving the utility of data. One of the most popular recent technologies is Differential Privacy (DP), which was used for the 2020 U.S. Census. Another trend is to combine synthetic data generators with DP to create so-called private synthetic data generators. The objective is to preserve statistical properties as accurately as possible while making the generated data as different as possible from the original data with respect to private features. While these technologies seem promising, there is a gap between academic research on DP and synthetic data and the practical application and evaluation of these techniques for real-world use cases. In this paper, we evaluate three different private synthetic data generators (MWEM, DP-CTGAN, and PATE-CTGAN) on their use-case-specific privacy and utility. ...


Publisher's version
DOI: 10.5445/IR/1000153458
Published on 5 December 2022
Original publication
DOI: 10.3390/app122312320
Scopus citations: 10
Web of Science citations: 7
Dimensions citations: 13
Affiliated institution(s) at KIT: Institut für Anthropomatik und Robotik (IAR); Institut für Informationssicherheit und Verlässlichkeit (KASTEL)
Publication type: Journal article
Publication year: 2022
Language: English
Identifier: ISSN: 2076-3417
KITopen-ID: 1000153458
HGF program: 46.23.04 (POF IV, LK 01) Engineering Security for Production Systems
Published in: Applied Sciences
Publisher: MDPI
Volume: 12
Issue: 23
Pages: Article no: 12320
Published online in advance on 1 December 2022
Keywords: synthetic data generation, differential privacy, secondary use, medical data, private data processing, open source framework
Indexed in: Dimensions, Web of Science, Scopus