A Gold Standard Benchmark Dataset for Digital Humanities

Kraus, Felix 1; Blumenröhr, Nicolas 1; Götzelmann, Germaine 1; Tonne, Danah 1; Streit, Achim 1
1 Scientific Computing Center (SCC), Karlsruher Institut für Technologie (KIT)


We present a benchmark dataset specifically designed to evaluate matching systems using controlled vocabularies from the digital humanities (DH). This dataset includes manually compiled gold standard alignments for eight DH test cases, addressing DH-specific challenges such as multilingualism, specialized terminology, and the use of SKOS (Simple Knowledge Organization System) as a data model. The dataset, including the reference, is publicly and persistently available and incorporated into the OAEI 2024.

To obtain a high-quality dataset, we developed requirements, including criteria for resource selection, and present their practical implementation. By focusing on test cases that closely reflect real-world vocabularies, we facilitate the advancement of matching systems, especially for subsequent mapping and integration tasks.

Evaluating the dataset using OAEI systems revealed significant weaknesses in their handling of SKOS and multilingual data, which underscores the relevance of our dataset. The evaluation also highlights the dataset's quality, validity, limitations, and lessons learned, offering valuable insights for future benchmark development.

DOI: 10.5445/IR/1000178023
Published on: 14.01.2025
Affiliated institution(s) at KIT: Scientific Computing Center (SCC)
Publication type: Proceedings contribution
Publication date: 14.01.2025
Language: English
Identifier: ISSN: 1613-0073
KITopen-ID: 1000178023
HGF program: 46.21.02 (POF IV, LK 01) Cross-Domain ATMLs and Research Groups
Further HGF programs: 46.21.05 (POF IV, LK 01) HMC
Published in: OM-2024: The 19th International Workshop on Ontology Matching collocated with the 23rd International Semantic Web Conference (ISWC 2024), November 11th, Baltimore, USA
Event: 19th International Workshop on Ontology Matching (2024), Baltimore, MD, USA, 11.11.2024
Publisher: RWTH Aachen
Pages: 1–17
Series: CEUR Workshop Proceedings ; 3897
Project information: SFB 980/3 (DFG, DFG KOORD, 202833 (internal))
Publication note: Proceedings of the 19th International Workshop on Ontology Matching
Keywords: Ontology Matching, Controlled Vocabularies, Reference Dataset, Digital Humanities, OAEI
