
An Empirical Evaluation of Constrained Feature Selection

Bach, Jakob 1,2; Zoller, Kolja 3,4; Trittenbach, Holger 1,2; Schulz, Katrin 3,4; Böhm, Klemens 1,2
1 Institut für Programmstrukturen und Datenorganisation (IPD), Karlsruher Institut für Technologie (KIT)
2 Fakultät für Informatik (INFORMATIK), Karlsruher Institut für Technologie (KIT)
3 Institut für Angewandte Materialien – Computational Materials Science (IAM-CMS), Karlsruher Institut für Technologie (KIT)
4 Fakultät für Maschinenbau (MACH), Karlsruher Institut für Technologie (KIT)

Abstract:

While feature selection helps to obtain smaller and more understandable prediction models, most existing feature-selection techniques do not consider domain knowledge. One way to use domain knowledge is via constraints on sets of selected features. However, the impact of constraints, e.g., on the predictive quality of selected features, is currently unclear. This article is an empirical study that evaluates the impact of propositional and arithmetic constraints on filter feature selection. First, we systematically generate constraints of various types, using datasets from different domains. As expected, constraints tend to decrease the predictive quality of feature sets, but this effect is non-linear. Consequently, we also observe feature sets that both adhere to constraints and have high predictive quality. Second, we study a concrete setting in materials science. This part of our study sheds light on how one can analyze scientific hypotheses with the help of constraints.
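
The idea of constrained filter feature selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `select_features`, the quality scores, and the two example constraints are hypothetical, and the exhaustive search is only practical for a handful of features. The sketch assumes that each feature has a univariate filter score, that the quality of a feature set is the sum of these scores, and that a cardinality bound `k` stands in for a simple arithmetic constraint.

```python
from itertools import combinations


def select_features(qualities, k, constraints):
    """Return (best_subset, best_score) over all subsets with at most k features.

    qualities:   per-feature filter scores, higher is better (hypothetical values).
    k:           cardinality bound, used here as a simple arithmetic constraint.
    constraints: predicates that take a frozenset of selected feature indices
                 and return True if the selection is allowed.
    The search is exhaustive, so it is only feasible for few features.
    """
    n = len(qualities)
    best_set, best_score = frozenset(), 0.0
    for size in range(1, k + 1):
        for subset in combinations(range(n), size):
            selected = frozenset(subset)
            # Skip feature sets that violate any domain-knowledge constraint.
            if not all(check(selected) for check in constraints):
                continue
            score = sum(qualities[i] for i in selected)
            if score > best_score:
                best_set, best_score = selected, score
    return best_set, best_score


# Hypothetical filter scores, e.g. absolute correlation with the target.
qualities = [0.9, 0.8, 0.5, 0.3]


def not_both_0_and_1(selected):
    # Propositional constraint: features 0 and 1 exclude each other.
    return not (0 in selected and 1 in selected)


def two_implies_three(selected):
    # Propositional constraint: selecting feature 2 requires feature 3.
    return 2 not in selected or 3 in selected


unconstrained = select_features(qualities, k=2, constraints=[])
constrained = select_features(
    qualities, k=2, constraints=[not_both_0_and_1, two_implies_three]
)
print("unconstrained:", sorted(unconstrained[0]), unconstrained[1])
print("constrained:  ", sorted(constrained[0]), constrained[1])
```

In this toy setting the constraints rule out the best unconstrained pair, so the constrained selection has a lower summed score, which mirrors the general tendency reported in the abstract. For realistic feature counts, an optimization or satisfiability solver would replace the brute-force loop.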


Publisher's version
DOI: 10.5445/IR/1000150015
Published on 17 August 2022
Original publication
DOI: 10.1007/s42979-022-01338-z
Citations (Scopus): 1
Citations (Dimensions): 1
Affiliated institution(s) at KIT: Institut für Angewandte Materialien – Computational Materials Science (IAM-CMS)
Institut für Programmstrukturen und Datenorganisation (IPD)
Publication type: Journal article
Publication date: 17 August 2022
Language: English
Identifier: ISSN: 2661-8907
KITopen-ID: 1000150015
Published in: SN Computer Science
Publisher: Springer Nature
Volume: 3
Issue: 6
Pages: Article no. 445
Keywords: Feature selection, Constraints, Domain knowledge, Theory-guided data science
Indexed in: Scopus, Dimensions