
FAIR DO Application Case for Composing Machine Learning Training Data

Blumenröhr, Nicolas 1; Jejkal, Thomas 1; Pfeil, Andreas 1; Stotzka, Rainer 1
1 Scientific Computing Center (SCC), Karlsruher Institut für Technologie (KIT)

Abstract:

This application case implements and uses the FAIR Digital Object (FAIR DO) concept to simplify the use of label information when composing Machine Learning (ML) training data.
Image data sets curated by different domain experts usually use non-identical label terms, so images with similar labels cannot easily be assigned to the same category. Using such images together as ML training data therefore requires laborious relabeling. Automating this process requires that label information supports machine-actionable decisions, and the FAIR DO concept is used for this purpose. A FAIR DO is a representation of scientific data and requires at least a globally unique Persistent Identifier (PID), relevant metadata, and a type.
We show the requirements for specifying and using FAIR DOs when applied to ML data. Based on an application case with Scanning Electron Microscopy (SEM) image data, a Proof-of-Principle approach shows the potential of the concept for ML-related data management.
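The idea of machine-actionable label information can be illustrated with a minimal Python sketch. It is not the implementation presented on the poster: the `FairDO` class, the `labeled-image` type name, the `example/...` PIDs, the label terms, and the `LABEL_TO_CATEGORY` mapping are all hypothetical and chosen only for illustration. The sketch assumes that each image is represented by a record with a PID, a type, and metadata, and that a mapping from data-set-specific label terms to shared categories is available.

```python
from dataclasses import dataclass, field


@dataclass
class FairDO:
    """Hypothetical minimal record: a PID, a type, and metadata."""
    pid: str        # globally unique persistent identifier (placeholder values below)
    do_type: str    # tells a machine how to interpret the record
    metadata: dict = field(default_factory=dict)


# Assumed mapping from data-set-specific label terms to a shared category.
LABEL_TO_CATEGORY = {
    "CNT": "nanotube",
    "carbon nanotube": "nanotube",
    "nanowires": "nanowire",
}


def compose_training_set(objects, label_map):
    """Group image PIDs by shared category; collect unmapped labels separately."""
    by_category = {}
    unresolved = []
    for obj in objects:
        if obj.do_type != "labeled-image":
            continue  # skip records that are not labeled images
        category = label_map.get(obj.metadata.get("label"))
        if category is None:
            unresolved.append(obj.pid)  # needs manual review
        else:
            by_category.setdefault(category, []).append(obj.pid)
    return by_category, unresolved


# Illustrative records from two imagined data sets with differing label terms.
objects = [
    FairDO("example/0001", "labeled-image", {"label": "CNT"}),
    FairDO("example/0002", "labeled-image", {"label": "carbon nanotube"}),
    FairDO("example/0003", "labeled-image", {"label": "nanowires"}),
    FairDO("example/0004", "labeled-image", {"label": "fibre"}),
    FairDO("example/0005", "dataset-description", {"title": "SEM images"}),
]

grouped, unresolved = compose_training_set(objects, LABEL_TO_CATEGORY)
```

In this sketch the type decides whether a record is considered at all, and the label metadata decides the category, so images labeled "CNT" and "carbon nanotube" end up in the same group without manual relabeling. Records whose label term is not covered by the mapping are returned separately instead of being assigned by guesswork.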
This work has been supported by the research program ‘Engineering Digital Futures’ of the Helmholtz Association of German Research Centers and the Helmholtz Metadata Collaboration (HMC) Platform.


DOI: 10.5445/IR/1000155008
Published on 23.01.2023
Original publication
DOI: 10.5281/zenodo.7243865
Affiliated institution(s) at KIT: Scientific Computing Center (SCC)
Publication type: Poster
Publication date: 05.10.2022
Language: English
Identifier: KITopen-ID: 1000155008
HGF program: 46.21.05 (POF IV, LK 01) HMC
Event: 1st Helmholtz Metadata Collaboration Conference 2022 (HMC 2022), Online, 05.10.2022 – 06.10.2022
Keywords: FAIR Digital Objects, Metadata, Machine Learning