Information Maximizing Curriculum: A Curriculum-Based Approach for Training Mixtures of Experts

Blessing, Denis 1; Celik, Onur 1; Jia, Xiaogang; Reuss, Moritz 1; Li, Maximilian Xiling 1; Lioutikov, Rudolf 1; Neumann, Gerhard 1
1 Institute for Anthropomatics and Robotics (IAR), Karlsruhe Institute of Technology (KIT)

Abstract:

Mixtures of Experts (MoE) are known for their ability to learn complex conditional distributions with multiple modes. Despite this potential, these models are challenging to train and often perform poorly, which explains their limited popularity. Our hypothesis is that this under-performance results from the commonly used maximum likelihood (ML) optimization, which leads to mode averaging and a higher likelihood of getting stuck in local maxima. We propose a novel curriculum-based approach to learning mixture models in which each component of the MoE is able to select its own subset of the training data for learning. This approach allows each component to be optimized independently, resulting in a more modular architecture that enables components to be added and deleted on the fly, and leading to an optimization that is less susceptible to local optima. The curricula can ignore data points from modes not represented by the MoE, reducing the mode-averaging problem. To achieve good data coverage, we couple the optimization of the curricula with a joint entropy objective and optimize a lower bound of this objective. ...
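The mode-averaging effect mentioned in the abstract can be illustrated with a toy example. The sketch below is not the method of the paper: it fits a single Gaussian by maximum likelihood to bimodal targets, once on all data and once on a hand-picked subset. The data distribution, the helper names (`sample_bimodal`, `fit_gaussian_ml`, `fit_on_subset`), and the fixed selection rule `y > 0` are assumptions made purely for illustration; in the paper the per-component data subsets are learned curricula, not a fixed threshold.

```python
import random
import statistics


def sample_bimodal(n, rng):
    """Draw n targets from an equal mixture of N(-2, 0.3^2) and N(+2, 0.3^2)."""
    return [rng.gauss(rng.choice([-2.0, 2.0]), 0.3) for _ in range(n)]


def fit_gaussian_ml(ys):
    """Maximum likelihood fit of one Gaussian: sample mean and population std."""
    return statistics.fmean(ys), statistics.pstdev(ys)


def fit_on_subset(ys, keep):
    """ML fit that only sees the samples selected by the predicate `keep`."""
    return fit_gaussian_ml([y for y in ys if keep(y)])


rng = random.Random(0)
ys = sample_bimodal(2000, rng)

# Fit on all data: the mean lands between the two modes (mode averaging)
# and the standard deviation is inflated to cover both of them.
mu_all, sigma_all = fit_gaussian_ml(ys)

# Fit on a hand-picked subset (hypothetical stand-in for a curriculum):
# the component ignores the other mode and concentrates on one of them.
mu_sub, sigma_sub = fit_on_subset(ys, lambda y: y > 0)

print(f"all data: mean={mu_all:.2f}, std={sigma_all:.2f}")
print(f"subset:   mean={mu_sub:.2f}, std={sigma_sub:.2f}")
```

The single component trained on everything places most of its mass where there is almost no data, while the component restricted to a subset matches one mode closely. This is the intuition behind letting each component choose which data points to learn from.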


Full text
DOI: 10.5445/IR/1000158725
Published on 12 May 2023
Affiliated institution(s) at KIT: Institute for Anthropomatics and Robotics (IAR)
Publication type: Research report/Preprint
Publication year: 2023
Language: English
Identifier: KITopen-ID: 1000158725
Publisher: arXiv
Extent: 16 pp.
Keywords: Machine Learning (cs.LG)
Indexed in: Dimensions, arXiv