KIT | KIT-Bibliothek | Impressum | Datenschutz

Using Constraints to Discover Sparse and Alternative Subgroup Descriptions

Bach, Jakob ORCID iD icon

Abstract:

Subgroup-discovery methods allow users to obtain simple descriptions of interesting regions in a dataset. Using constraints in subgroup discovery can enhance interpretability even further. In this article, we focus on two types of constraints: First, we limit the number of features used in subgroup descriptions, making the latter sparse. Second, we propose the novel optimization problem of finding alternative subgroup descriptions, which cover a similar set of data objects as a given subgroup but use different features. We describe how to integrate both constraint types into heuristic subgroup-discovery methods. Further, we propose a novel Satisfiability Modulo Theories (SMT) formulation of subgroup discovery as a white-box optimization problem, which allows solver-based search for subgroups and is open to a variety of constraint types. Additionally, we prove that both constraint types lead to an NP-hard optimization problem. Finally, we employ 27 binary-classification datasets to compare heuristic and solver-based search for unconstrained and constrained subgroup discovery. We observe that heuristic search methods often yield high-quality subgroups within a short runtime, also in scenarios with constraints.


Volltext §
DOI: 10.5445/IR/1000171285
Veröffentlicht am 04.06.2024
Cover der Publikation
Zugehörige Institution(en) am KIT Institut für Programmstrukturen und Datenorganisation (IPD)
Publikationstyp Forschungsbericht/Preprint
Publikationsdatum 03.06.2024
Sprache Englisch
Identifikator KITopen-ID: 1000171285
Verlag arxiv
Umfang 69 S.
Externe Relationen Forschungsdaten/Software
Schlagwörter subgroup discovery, alternatives, constraints, satisfiability modulo theories, explainability, interpretability, XAI
Nachgewiesen in arXiv
Dimensions
Relationen in KITopen
KIT – Die Forschungsuniversität in der Helmholtz-Gemeinschaft
KITopen Landing Page