
Accelerating Neural Network Training with Distributed Asynchronous and Selective Optimization (DASO)

Coquelin, Daniel; Debus, Charlotte; Götz, Markus; Lehr, Fabrice von der; Kahn, James; Siggel, Martin; Streit, Achim

Abstract (English):

With increasing data and model complexities, the time required to train neural networks has become prohibitively large. To address the exponential rise in training time, users are turning to data parallel neural networks (DPNN) and large-scale distributed resources on computer clusters. Current DPNN approaches implement the network parameter updates by synchronizing and averaging gradients across all processes with blocking communication operations after each forward-backward pass. This synchronization is the central algorithmic bottleneck. We introduce the Distributed Asynchronous and Selective Optimization (DASO) method, which leverages multi-GPU compute node architectures to accelerate network training while maintaining accuracy. DASO uses a hierarchical and asynchronous communication scheme composed of node-local and global networks while adjusting the global synchronization rate during the learning process. We show that DASO yields a reduction in training time of up to 34% on classical and state-of-the-art networks, as compared to current optimized data parallel training methods.
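The abstract describes a hierarchical, partially asynchronous synchronization scheme: gradients are averaged synchronously within each multi-GPU node every step, while the expensive inter-node averaging happens less often and without blocking. The following is a minimal sketch of that idea written against PyTorch's torch.distributed primitives; it is not the authors' DASO implementation, and the group construction, the global_interval parameter, and the helper names are illustrative assumptions.

```python
# Illustrative sketch of hierarchical, partially asynchronous gradient
# synchronization in the spirit of DASO; NOT the authors' implementation.
# Assumes one process per GPU and `gpus_per_node` processes per node.
import torch.distributed as dist


def node_local_group(gpus_per_node: int):
    """Communication group spanning only the GPUs of this process's node."""
    rank = dist.get_rank()
    node = rank // gpus_per_node
    return dist.new_group(list(range(node * gpus_per_node,
                                     (node + 1) * gpus_per_node)))


def hierarchical_sync(model, local_group, gpus_per_node, step,
                      global_interval, pending):
    """Average gradients node-locally every step; start a non-blocking
    global parameter average only every `global_interval` steps."""
    # 1) Blocking node-local averaging over the fast intra-node links.
    for p in model.parameters():
        if p.grad is not None:
            dist.all_reduce(p.grad, group=local_group)
            p.grad /= gpus_per_node

    # 2) Infrequent, asynchronous global averaging, so slow inter-node
    #    traffic overlaps with subsequent iterations (this is the source
    #    of the "stale gradients" trade-off named in the keywords).
    if step % global_interval == 0:
        for p in model.parameters():
            pending.append((p, dist.all_reduce(p.data, async_op=True)))
    return pending


def fold_in_global_average(pending):
    """A few steps later: wait for the global all-reduce and replace the
    local parameters with the world-wide average."""
    world = dist.get_world_size()
    for p, work in pending:
        work.wait()
        p.data /= world
    pending.clear()
```

In such a sketch the global_interval value would be tightened or relaxed as training progresses, mirroring the abstract's statement that DASO adjusts the global synchronization rate during the learning process.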


Full text
DOI: 10.5445/IR/1000137220
Published on 06.09.2021
Original publication
DOI: 10.21203/rs.3.rs-832355/v1
Affiliated institution(s) at KIT: Scientific Computing Center (SCC);
Universität Karlsruhe (TH) – Zentrale Einrichtungen
Publication type: Research report / preprint
Publication year: 2021
Language: English
Identifier: KITopen-ID: 1000137220
HGF program: 46.21.04 (POF IV, LK 01) HAICU
Publisher: Springer
Remark on publication: Under review in: Journal of Big Data
Published online in advance on 12.04.2021
Keywords: machine learning, neural networks, data parallel training, multi-node, multi-GPU, stale gradients
Indexed in: Dimensions