
A Survey on Metadata for Machine Learning Models and Datasets: Standards, Practices, and Harmonization Challenges

Gesese, Genet Asefa 1; Chen, Zongxiong; Zoubia, Oussama; Limani, Fidan; Silva, Kanishka; Suryani, Muhammad Asif; Zapilko, Benjamin; Castro, Leyla Jael Garcia; Kutafina, Ekaterina V.; Solanki, Dhwani; Fliegl, Heike 2; Schimmler, Sonja; Boukhers, Zeyd; Sack, Harald 1
1 Institut für Angewandte Informatik und Formale Beschreibungsverfahren (AIFB), Karlsruher Institut für Technologie (KIT)
2 Institut für Nanotechnologie (INT), Karlsruher Institut für Technologie (KIT)

Abstract:

The growing availability of machine learning (ML) models, datasets, and related artifacts across platforms,
such as Hugging Face, GitHub, and Zenodo, has amplified the need for structured and standardized metadata.
However, metadata practices remain highly heterogeneous, differing in schema design, vocabulary usage, and
semantic expressiveness, posing significant challenges for tasks such as representation, extraction, alignment, and
integration. This fragmentation impedes the development of infrastructures that depend on machine-actionable
metadata to support discovery, provenance tracking, or cross-platform interoperability. While metadata is also
foundational to enabling FAIR (Findable, Accessible, Interoperable, and Reusable) principles in ML, there is a lack
of consolidated understanding of how existing standards support interoperability and alignment across platforms.
In this survey, we review and compare a range of general-purpose and ML-specific metadata standards, evaluating
their suitability for cross-platform alignment, discoverability, extensibility, and interoperability. We assess these
standards based on defined criteria and analyze their potential to support unified, FAIR-compliant metadata
...


Publisher's version
DOI: 10.5445/IR/1000188068
Published on: 5 December 2025
Scopus citations: 1
Affiliated institution(s) at KIT: Institut für Angewandte Informatik und Formale Beschreibungsverfahren (AIFB)
Publication type: Proceedings paper
Publication year: 2025
Language: English
Identifier: ISSN: 1613-0073
KITopen-ID: 1000188068
Published in: Sci-K 2025 Scientific Knowledge: Representation, Discovery, and Assessment 2025: Proceedings of the 5th International Workshop on Scientific Knowledge: Representation, Discovery, and Assessment co-located with 24th International Semantic Web Conference (ISWC 2025); Nara, Japan, November 2, 2025
Event: 5th International Workshop on Scientific Knowledge: Representation, Discovery, and Assessment @ ISWC (SCI-K@ISWC 2025), Nara, Japan, November 2, 2025
Publisher: CEUR-WS
Pages: 57-71
Series: Proceedings; 4065
Published online in advance on: 13 October 2025
Indexed in: Scopus