
On how data are partitioned in model development and evaluation: Confronting the elephant in the room to enhance model generalization

Maier, Holger R.; Zheng, Feifei; Gupta, Hoshin; Chen, Junyi; Mai, Juliane; Savic, Dragan; Loritz, Ralf¹; Wu, Wenyan; Guo, Danlu; Bennett, Andrew; Jakeman, Anthony; Razavi, Saman; Zhao, Jianshi
¹ Institut für Wasser und Gewässerentwicklung (IWG), Karlsruher Institut für Technologie (KIT)

Abstract:

Models play a pivotal role in advancing our understanding of Earth's physical nature and environmental systems, aiding in their efficient planning and management. The accuracy and reliability of these models heavily rely on data, which are generally partitioned into subsets for model development and evaluation. Surprisingly, how this partitioning is done is often not justified, even though it determines what model we end up with, how we assess its performance and what decisions we make based on the resulting model outputs. In this study, we shed light on the paramount importance of meticulously considering data partitioning in the model development and evaluation process, and its significant impact on model generalization. We identify flaws in existing data-splitting approaches and propose a forward-looking strategy to effectively confront the “elephant in the room”, leading to improved model generalization capabilities.


Publisher's version
DOI: 10.5445/IR/1000161496
Published on 23.08.2023
Original publication
DOI: 10.1016/j.envsoft.2023.105779
Scopus
Citations: 15
Web of Science
Citations: 8
Dimensions
Citations: 15
Affiliated institution(s) at KIT: Institut für Wasser und Gewässerentwicklung (IWG)
Publication type: Journal article
Publication month/year: 09.2023
Language: English
Identifier: ISSN: 1364-8152, 1873-6726
KITopen-ID: 1000161496
Published in: Environmental Modelling and Software
Publisher: Elsevier
Volume: 167
Pages: Article no. 105779
Published online in advance on 31.07.2023
Keywords: Model development, Model evaluation, Data partitioning, Data splitting, Calibration, Validation, Uncertainty, Earth systems
Indexed in: Web of Science, Dimensions, Scopus