KIT | KIT-Bibliothek | Impressum | Datenschutz

Finite-time Analysis of Globally Nonstationary Multi-Armed Bandits

Komiyama, Junpei ; Fouché, Edouard 1; Honda, Junya
1 Institut für Programmstrukturen und Datenorganisation (IPD), Karlsruher Institut für Technologie (KIT)

Abstract:

We consider nonstationary multi-armed bandit problems where the model parameters of
the arms change over time. We introduce the adaptive resetting bandit (ADR-bandit), a
bandit algorithm class that leverages adaptive windowing techniques from literature on data
streams. We first provide new guarantees on the quality of estimators resulting from adap-
tive windowing techniques, which are of independent interest. Furthermore, we conduct
a finite-time analysis of ADR-bandit in two typical environments: an abrupt environment
where changes occur instantaneously and a gradual environment where changes occur pro-
gressively. We demonstrate that ADR-bandit has nearly optimal performance when abrupt
or gradual changes occur in a coordinated manner that we call global changes. We demon-
strate that forced exploration is unnecessary when we assume such global changes. Unlike
the existing nonstationary bandit algorithms, ADR-bandit has optimal performance in
stationary environments as well as nonstationary environments with global changes. Our
experiments show that the proposed algorithms outperform the existing approaches in syn-
thetic and real-world environments.


Verlagsausgabe §
DOI: 10.5445/IR/1000186945
Veröffentlicht am 17.11.2025
Scopus
Zitationen: 1
Cover der Publikation
Zugehörige Institution(en) am KIT Institut für Programmstrukturen und Datenorganisation (IPD)
Publikationstyp Zeitschriftenaufsatz
Publikationsmonat/-jahr 04.2024
Sprache Englisch
Identifikator ISSN: 1532-4435, 1533-7928
KITopen-ID: 1000186945
Erschienen in Journal of Machine Learning Research
Verlag Journal of Machine Learning Research
Band 25
Seiten 1-56
Nachgewiesen in Scopus
KIT – Die Universität in der Helmholtz-Gemeinschaft
KITopen Landing Page