
Recommending Datasets for Scientific Problem Descriptions

Färber, M.; Leisinger, A.-K.


The steadily rising number of datasets makes it increasingly difficult for researchers and practitioners to keep track of all available datasets, and in particular to find the most relevant ones for a given research problem. Dataset search engines have been proposed to address this. However, they rely on user-supplied keywords and therefore have difficulty identifying well-fitting datasets for complex research problems. In this paper, we propose a system that recommends suitable datasets based on a given research problem description. The recommendation task is framed as a domain-specific text classification task. In a comprehensive offline evaluation using various state-of-the-art models, with 88,000 paper abstracts and 265,000 citation contexts serving as research problem descriptions, we obtain an F1-score of 0.75. In an additional user study, we show that users in real-world settings are satisfied in 88% of all test cases. We therefore see promising future directions for dataset recommendation.
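
The idea of treating dataset recommendation as text classification can be illustrated with a small sketch. This is not the system described in the abstract, which evaluates various state-of-the-art models; a simple multinomial naive Bayes classifier is swapped in here for brevity. The class name, the dataset labels ("image-benchmark", "qa-corpus", "kg-dump"), and the training sentences are all invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesRecommender:
    """Hypothetical helper: multinomial naive Bayes, one class per dataset."""

    def fit(self, descriptions, dataset_labels):
        self.word_counts = defaultdict(Counter)    # dataset -> token counts
        self.doc_counts = Counter(dataset_labels)  # dataset -> number of texts
        for text, label in zip(descriptions, dataset_labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def score(self, text):
        """Return a log-probability score for every known dataset."""
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, counts in self.word_counts.items():
            # Add-one smoothing: every vocabulary word gets one pseudo-count.
            denom = sum(counts.values()) + len(self.vocab)
            log_prob = math.log(self.doc_counts[label] / total_docs)
            for t in tokens:
                log_prob += math.log((counts[t] + 1) / denom)
            scores[label] = log_prob
        return scores

    def recommend(self, text, k=2):
        """Return the k dataset labels with the highest scores."""
        scores = self.score(text)
        return sorted(scores, key=scores.get, reverse=True)[:k]


# Invented toy training pairs: (research problem description, dataset label).
TRAIN = [
    ("we classify handwritten digit images with a convolutional network", "image-benchmark"),
    ("object recognition in natural images using deep learning", "image-benchmark"),
    ("answering natural language questions over wikipedia passages", "qa-corpus"),
    ("reading comprehension and question answering with transformers", "qa-corpus"),
    ("entity linking and link prediction on a knowledge graph", "kg-dump"),
    ("querying rdf triples of a large knowledge graph", "kg-dump"),
]

model = NaiveBayesRecommender().fit([t for t, _ in TRAIN], [d for _, d in TRAIN])
top = model.recommend("we study question answering over long passages", k=2)
print(top)
```

With this toy data, the first recommendation for the query is the "qa-corpus" label. A realistic setup would train on paper abstracts or citation contexts paired with the datasets they use, and would report a metric such as the F1-score on held-out descriptions.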

Publisher's version
DOI: 10.5445/IR/1000140363
Published on 26.11.2021
DOI: 10.1145/3459637.3482166
Citations: 6
Citations: 5
Affiliated institution(s) at KIT: Institut für Angewandte Informatik und Formale Beschreibungsverfahren (AIFB)
Publication type: Proceedings contribution
Publication year: 2021
Language: English
Identifier: ISBN: 978-1-4503-8446-9
KITopen-ID: 1000140363
Published in: Proceedings of the 30th ACM International Conference on Information & Knowledge Management. Ed.: G. Demartini
Event: 30th ACM International Conference on Information and Knowledge Management (CIKM 2021), Online, 01.11.2021 – 05.11.2021
Publisher: Association for Computing Machinery (ACM)
Pages: 3014-3018
Series: ACM Conferences
Indexed in: Scopus