Automatic Feature Engineering through Monte Carlo Tree Search

Huang, Yiran 1; Zhou, Yexu 1; Hefenbrock, Michael 1; Riedel, Till 1; Fang, Likun 1; Beigl, Michael 1
1 Institut für Telematik (TM), Karlsruher Institut für Technologie (KIT)


The performance of machine learning models depends heavily on the feature space and on feature engineering. Although neural networks have made significant progress in learning latent feature spaces from data, compositional feature engineering through nested feature transformations can reduce model complexity and can be particularly desirable for interpretability. To find suitable transformations automatically, state-of-the-art methods model the feature transformation space as a graph and search it with heuristics such as $\epsilon$-greedy. Such search strategies tend to become less efficient over time because they ignore the sequential information in the candidate transformation sequences and cannot adjust the heuristic strategy dynamically. To address these shortcomings, we propose a reinforcement learning-based automatic feature engineering method, which we call Monte Carlo tree search Automatic Feature Engineering (mCAFE). We employ a surrogate model that captures the sequential information contained in a transformation sequence and can therefore adjust the exploration strategy dynamically. mCAFE balances exploration and exploitation by Thompson sampling and uses a Long Short-Term Memory (LSTM)-based surrogate model to estimate promising sequences of transformations. …
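To make the search idea in the abstract more concrete, here is a minimal sketch of a Monte Carlo tree search over transformation sequences in which child selection uses Thompson sampling instead of an $\epsilon$-greedy rule. It is not the mCAFE implementation. The transformation set (`TRANSFORMS`), the depth limit (`MAX_DEPTH`), the Gaussian posterior approximation in `sample_value`, and the reward function `toy_reward` are all hypothetical assumptions. The LSTM surrogate is omitted entirely: rewards here come from a toy function rather than from a learned estimate or from training a model on transformed features.

```python
import math
import random
from dataclasses import dataclass, field

# Hypothetical unary transformations; the abstract does not list the real candidate set.
TRANSFORMS = ("log", "sqrt", "square", "reciprocal")
MAX_DEPTH = 3  # hypothetical limit on the length of a transformation sequence


def toy_reward(seq: tuple) -> float:
    """Hypothetical stand-in for 'apply the sequence, train a model, return a
    validation score'. It deterministically favors the prefix ('log', 'square')
    and penalizes longer sequences slightly."""
    target = ("log", "square")
    matches = sum(a == b for a, b in zip(seq, target))
    return matches - 0.1 * len(seq)


@dataclass
class Node:
    seq: tuple
    visits: int = 0
    total: float = 0.0
    children: dict = field(default_factory=dict)

    def sample_value(self, prior_std: float = 1.0) -> float:
        # Assumed Gaussian posterior: mean of observed rewards (0 if unseen) with a
        # spread shrinking as 1/sqrt(visits + 1). One draw = one Thompson sample.
        mean = self.total / self.visits if self.visits else 0.0
        return random.gauss(mean, prior_std / math.sqrt(self.visits + 1))


def search(iterations: int = 300, seed: int = 0) -> tuple:
    random.seed(seed)
    root = Node(seq=())
    best_seq, best_reward = (), float("-inf")
    for _ in range(iterations):
        node, path = root, [root]
        # Selection + expansion: descend by taking the child with the largest sampled value.
        while len(node.seq) < MAX_DEPTH:
            for t in TRANSFORMS:
                node.children.setdefault(t, Node(seq=node.seq + (t,)))
            node = max(node.children.values(), key=lambda c: c.sample_value())
            path.append(node)
            if node.visits == 0:
                break  # stop and evaluate a freshly expanded node
        # Evaluation: score the selected transformation sequence.
        reward = toy_reward(node.seq)
        if reward > best_reward:
            best_seq, best_reward = node.seq, reward
        # Backpropagation: update visit counts and reward sums along the path.
        for n in path:
            n.visits += 1
            n.total += reward
    return best_seq


if __name__ == "__main__":
    print(search())
```

Because each unvisited node is sampled from a wide prior while well-explored nodes are sampled close to their mean reward, the search keeps trying unseen transformations without a fixed exploration rate; with this toy reward it should usually return `('log', 'square')`.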

DOI: 10.5445/IR/1000152793
Published on 22 November 2022
Affiliated institution(s) at KIT: Institut für Telematik (TM)
Publication type: Research report/Preprint
Publication date: 25 September 2022
Language: English
Identifier: KITopen-ID: 1000152793
Publisher: arXiv
Keywords: data mining, feature engineering, Monte Carlo tree search, reinforcement learning
